Towards automatic retrieval of idioms in
Authors
Abstract
This paper presents a procedure for the automatic retrieval of idiomatic expressions from large text corpora. The procedure combines text segmentation techniques with Latent Semantic Analysis (Landauer, Foltz, & Laham, 1998). Three indices were computed on the basis of a three-fold hypothesis: a) idiomatic expressions should have few neighbours; b) idiomatic expressions should show low semantic proximity between the words composing them; and c) idiomatic expressions should show low semantic proximity between the expression and the preceding and subsequent segments. The results show that fully automatic retrieval of idioms from large corpora has not yet been achieved, but this first trial is a step in that direction. The procedure reduces the amount of data to consider to less than a quarter (23.8%) of the original data, of which one fifth (20.9%) is idiomatic and nearly 60% (58.8%) is phraseological in nature. In other words, the procedure substantially reduces the effort of hand-based retrieval. In addition, these first results already permit some linguistic exploitation of the retrieved idioms.
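The three indices can be illustrated with a minimal sketch. The abstract does not say how the indices are actually computed, so everything below is an assumption: the toy three-dimensional vectors stand in for vectors from a real LSA space, an expression or segment vector is taken to be the sum of its word vectors, proximity is measured by cosine similarity, and the function names and the 0.7 neighbour threshold are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def add_vectors(vectors):
    """Assumed composition rule: a segment vector is the sum of its word vectors."""
    return [sum(components) for components in zip(*vectors)]

def neighbour_count(expr_vec, other_vecs, threshold=0.7):
    """Index (a): number of other expressions at least `threshold` close (hypothetical cutoff)."""
    return sum(1 for v in other_vecs if cosine(expr_vec, v) >= threshold)

def internal_proximity(word_vecs):
    """Index (b): mean pairwise cosine between the words of the expression."""
    pairs = list(combinations(word_vecs, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

def context_proximity(expr_vec, prev_vec, next_vec):
    """Index (c): mean cosine between the expression and its neighbouring segments."""
    return (cosine(expr_vec, prev_vec) + cosine(expr_vec, next_vec)) / 2

# Toy vectors, invented for illustration only (not taken from any corpus).
word_vectors = {
    "kick":   [0.9, 0.1, 0.0],
    "bucket": [0.0, 0.2, 0.9],
    "drink":  [0.1, 0.9, 0.1],
    "water":  [0.2, 0.9, 0.2],
}

idiom_words = [word_vectors["kick"], word_vectors["bucket"]]
literal_words = [word_vectors["drink"], word_vectors["water"]]
idiom_vec = add_vectors(idiom_words)
literal_vec = add_vectors(literal_words)

print("internal proximity (idiom):  ", round(internal_proximity(idiom_words), 3))
print("internal proximity (literal):", round(internal_proximity(literal_words), 3))
print("neighbours of idiom:", neighbour_count(idiom_vec, [literal_vec]))
```

Under the hypothesis stated in the abstract, a candidate segment would be flagged as potentially idiomatic when all three values are low; how the indices are thresholded or combined in the actual procedure is not specified here.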
Similar resources
Attitude of Muslim Students towards English Idioms and Proverbs
This study aimed at investigating the attitude of Muslim students towards the use of certain English idioms and proverbs. Thirty Muslim students were asked to express their reactions and feelings towards two categories of English idioms and proverbs: the first category included idioms and proverbs containing the names of animals that are prohibited in Islam, and the second category contained cu...
Fuzzy Neighbor Voting for Automatic Image Annotation
With the rapid development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...
Semiautomatic Image Retrieval Using the High Level Semantic Labels
Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges of each approach have led researchers to combine them and to use semi-automatic retrieval, with user interaction in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...
Automatic Colorization of Grayscale Images Using Generative Adversarial Networks
Automatic colorization of gray scale images poses a unique challenge in Information Retrieval. The goal of this field is to colorize images which have lost some color channels (such as the RGB channels or the AB channels in the LAB color space) while only having the brightness channel available, which is usually the case in a vast array of old photos and portraits. Having the ability to coloriz...
The Comparative Effect of Using Idioms in Conversation and Paragraph Writing on EFL Learners’ Idiom Learning
This study investigated the comparative effect of teaching idiomatic expressions through practicing them in conversation and paragraph writing on intermediate EFL learners’ idiom learning. The participants were sorted out of a population of 134 intermediate students in Zabansara Language School in Khorramabad based on their scores on a Preliminary English Test (PET) and an idiom test piloted in...